

 adjoint method




Do Residual Neural Networks discretize Neural Ordinary Differential Equations?

Neural Information Processing Systems

Neural ODEs also provide a theoretical framework to study deep learning models from the continuous viewpoint, using the arsenal of ODE theory [40, 25, 41]. Importantly, they can also be seen as the continuous analog of ResNets.





Do Residual Neural Networks discretize Neural Ordinary Differential Equations?

Neural Information Processing Systems

Neural Ordinary Differential Equations (Neural ODEs) are the continuous analog of Residual Neural Networks (ResNets). We investigate whether the discrete dynamics defined by a ResNet are close to the continuous one of a Neural ODE. We first quantify the distance between the ResNet's hidden state trajectory and the solution of its corresponding Neural ODE. Our bound is tight and, on the negative side, does not go to $0$ with depth $N$ if the residual functions are not smooth with depth. On the positive side, we show that this smoothness is preserved by gradient descent for a ResNet with linear residual functions and small enough initial loss. It ensures an implicit regularization towards a limit Neural ODE at rate $\frac1N$, uniformly with depth and optimization time.
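The relationship described above can be illustrated with a toy sketch. It is not the paper's setting or code: it assumes a scalar ResNet with linear residual functions $x_{n+1} = x_n + \frac1N w_n x_n$, where the weights $w_n = \cos(n/N)$ vary smoothly with depth, and compares its output to the closed-form solution of the ODE $\dot{x} = \cos(t)\,x$ at $t = 1$. The function names and the choice of $\cos$ are hypothetical, picked only so the ODE has an exact solution.

```python
import math


def resnet_final_state(x0, depth, weight_fn):
    """Scalar ResNet with linear residuals: x_{n+1} = x_n + (1/N) * w_n * x_n.

    The weight of layer n is weight_fn(n / N), so the weights are smooth
    in depth whenever weight_fn is smooth (an assumption of this sketch).
    """
    x = x0
    for n in range(depth):
        x = x + (1.0 / depth) * weight_fn(n / depth) * x
    return x


def ode_final_state(x0):
    """Closed-form solution at t = 1 of dx/dt = cos(t) * x with x(0) = x0."""
    return x0 * math.exp(math.sin(1.0))


def discretization_error(depth, x0=1.0):
    """Distance between the ResNet output and the ODE solution at t = 1."""
    return abs(resnet_final_state(x0, depth, math.cos) - ode_final_state(x0))


# Error for increasing depth N; with smooth weights it should shrink like 1/N.
errors = {depth: discretization_error(depth) for depth in (10, 100, 1000)}
for depth, err in errors.items():
    print(f"N = {depth:5d}   error = {err:.3e}   N * error = {depth * err:.3f}")
```

In this toy case the product of depth and error stays roughly constant, which is consistent with a $\frac1N$ rate. The sketch covers only the smooth-weight case and says nothing about weights that are not smooth with depth.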


Symplectic Adjoint Method for Exact Gradient of Neural ODE with Minimal Memory

Neural Information Processing Systems

A neural network model of a differential equation, namely neural ODE, has enabled the learning of continuous-time dynamical systems and probabilistic distributions with high accuracy. The neural ODE uses the same network repeatedly during a numerical integration. The memory consumption of the backpropagation algorithm is proportional to the number of uses times the network size. This is true even if a checkpointing scheme divides the computation graph into sub-graphs.
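The memory claim above can be made concrete with a minimal sketch. It is not the symplectic adjoint method and not the authors' implementation: it assumes a scalar "network" $f(x, \theta) = \theta x$ integrated with the explicit Euler scheme, and backpropagates by hand through the integration. All names are hypothetical. The point is that the backward pass needs the input of every network use, so the number of saved states equals the number of uses.

```python
def forward_euler(x0, theta, num_steps, total_time=1.0):
    """Integrate dx/dt = theta * x with explicit Euler, saving every input state."""
    h = total_time / num_steps
    saved_states = []  # one entry per use of the "network" f(x, theta) = theta * x
    x = x0
    for _ in range(num_steps):
        saved_states.append(x)
        x = x + h * theta * x
    return x, saved_states


def backprop_euler(saved_states, theta, total_time=1.0):
    """Gradient of the final state with respect to theta via reverse-mode backprop."""
    num_steps = len(saved_states)
    h = total_time / num_steps
    adjoint = 1.0      # d(x_N)/d(x_{n+1}), starting from n + 1 = N
    grad_theta = 0.0
    for x_n in reversed(saved_states):
        grad_theta += adjoint * h * x_n   # direct dependence of x_{n+1} on theta
        adjoint *= 1.0 + h * theta        # propagate the adjoint to x_n
    return grad_theta


x_final, saved = forward_euler(x0=2.0, theta=0.5, num_steps=50)
grad = backprop_euler(saved, theta=0.5)
print(f"saved states: {len(saved)}, d x_N / d theta = {grad:.6f}")
```

Here each saved state is a single float; for a real neural network each use would instead retain that network's intermediate activations, which is where the "number of uses times the network size" scaling comes from.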